ABSTRACT

The use of explicit, written criteria to evaluate 
college and university faculty is examined. Four major issues 
concerning faculty evaluation are as follows: the desired outcomes of 
faculty evaluation; the functions of faculty activity that are to be 
evaluated; the criteria that should be used for each area evaluated; 
and the procedures for implementing the faculty evaluation program. 
In general, studies identify two major outcomes: personnel decisions
made regarding promotion, retention, and tenure; and feedback to 
faculty leading to faculty improvement. The major areas to be 
evaluated are teaching, research, and service. For the most part, 
faculty evaluation programs attempt to increase objectivity through 
both qualitative and quantitative approaches. To achieve qualitative 
objectivity, criteria are developed to improve the quality of data 
collected from an individual evaluator. To achieve quantitative 
objectivity, data are collected from multiple data sources. A 
critical issue in faculty evaluation is determining how data are 
collected and reviewed. At some institutions, faculty are expected to 
provide evidence of their teaching, research, and service
effectiveness. Since the 1970s, there has been a trend toward 
systematic, standardized data collection. The role played by 
administrators in faculty evaluation is addressed. Meta evaluations 
(i.e., evaluating the methods of evaluation) conducted by the Oregon
State System of Higher Education, the State University of New York 
Faculty Council of Community Colleges, and the Southern Regional 
Education Board are examined. Appended materials include a form for 
peer review of undergraduate teaching based on dossier materials, 
guidelines for use of results of the student instructional report, 
and a bibliography. (SW)

As colleges and universities experience various financial pressures and
face less-than-certain enrollment trends, there is an increased interest in 
the methods and criteria by which faculty performance is assessed. This 
interest has been made evident in the meta-evaluations conducted by 
systems of higher education, more systematic programs of faculty
evaluation developed by individual institutions, and research studies carried
out by evaluation specialists. 

Prior to 1970, faculty evaluations were usually conducted in an informal
fashion. In recent years, however, more formal evaluation methods
have been developed and used on an increasingly widespread basis. The
use of more systematic faculty evaluation methods has been manifested
by the development and use of explicit written criteria.

In the coming decade, the use of formalized, explicit faculty evaluation 
will become more commonplace due to a low turnover in faculty. While 
institutional growth dwindles or stops, faculty retirement age is being 
extended, and institutions are finding themselves with a large proportion 
of tenured faculty. Thus, colleges and universities must devise systems to 
promote teaching and research excellence at the same time as they respond
to mounting financial pressures and changes in the education "marketplace."
Furthermore, decisions in the nation's courts compel institutions
to provide documentation to back up personnel decisions. In the history
of American higher education there probably has been no time when the
internal and external forces have come together so strongly to support a
formalized system to measure the performance of faculty.

This Research Report by Neal Whitman, director of educational
development, Department of Family and Community Medicine at the
University of Utah School of Medicine, and Elaine Weiss, president of
Educational Dimensions, Inc., Salt Lake City, examines the use of explicit, 
written criteria to evaluate college and university faculty. It also traces 
the use of evaluations in faculty development initiatives and promotion, 
retention, and tenure decisions. It will be of interest to academic admin- 
istrators responsible for conducting faculty evaluations as well as to fac- 
ulty who are the focus of such evaluations. 

Jonathan D. Fife 

Director 

ERIC Clearinghouse on Higher Education
The George Washington University 



Overview 



One view of faculty evaluation until the 1970s was that factors other than 
academic merit influenced promotion, retention, and tenure (PRT)
decisions, including the ability to get along and not make waves. A trend of
the 1970s that has continued into the 1980s has been for colleges and 
universities to develop faculty evaluation programs that are more system- 
atic and comprehensive than those of the past. A particular feature of 
many of these programs is the use of written explicit criteria to evaluate 
faculty. 

One explanation for the attention to faculty evaluation that began in 
the 1970s was changes in the economics of higher education. During the
1960s, when many colleges and universities were expanding, it was all 
administrators could do to find and keep faculty. In the 1970s, when 
program retrenchment became a reality, declining enrollments and
financial resources plus increasing costs of operation influenced both
administrators and faculty to reconsider policies and procedures for making
personnel decisions. Related factors that brought attention to faculty
evaluation were faculty demands for a greater share in governance and state
government demands for accountability.

A manifestation of the increased interest in faculty evaluation was the 
willingness of systems of higher education to conduct "meta evaluations," 
that is, to evaluate their methods of evaluation. Meta evaluations con- 
ducted during the 1970s in three different regions of the country included 
those of the Oregon State System of Higher Education, the State University 
of New York, and the Southern Regional Education Board. A striking 
feature of these meta evaluations was that common issues were identified.

One set of issues concerns the purpose of faculty evaluation. Although
many faculty evaluation programs purport to help develop faculty as well 
as to provide data for PRT decisions, often the reality is that faculty 
development is paid only lip service. Many faculty believe that faculty
development is important, but see faculty evaluation as a negative process
unrelated to faculty improvement. Often administrators assume that faculty
evaluation automatically leads to improvement because faculty will
seek better evaluations. Unfortunately, it does not. Conditions necessary
to make that connection include trust between administrators and faculty;
faculty involvement in designing and implementing the evaluation program;
and educational resources, such as consultation, to accompany evaluation
results. Creating these conditions is desirable because it is efficient
to use one system of data collection to support faculty evaluation and
development programs. 

A second set of issues concerns the areas to be evaluated. Traditionally,
teaching, research, and service are the three areas evaluated; usually, 
however, little weight is given to service. On the other hand, teaching and 
research often are seen as competing obligations. A disturbing finding of 
some studies is that there is wide disagreement within institutions and, 
sometimes, even within academic departments, concerning the weights 
that are given to teaching, research, and service. A frustrating finding is 
that, although many administrators and faculty would like to give more
weight to teaching, the state of the art of evaluating teaching does not
instill confidence in the reliability or validity of teacher evaluation.

A third set of issues concerns the criteria and standards used to evaluate
faculty. The literature of the 1970s and early 1980s reveals strong concerns
over the objectivity of faculty evaluation, especially in the area of teaching.
One approach to promoting the goal of objectivity is qualitative, i.e.,
techniques are used to reduce the bias of an individual evaluator. A second
approach is quantitative, i.e., data are collected from many sources so
that the evaluation does not depend on a single person.

The most common strategy of the qualitative approach is to provide
those doing the evaluating and those being evaluated with written explicit
criteria and standards. The reasoning for providing evaluators with explicit
criteria is that they will focus on the important elements of teaching,
research, and service. The reasoning for providing those being evaluated 
with explicit criteria is that the rules of fair play dictate giving everyone 
an equal opportunity to succeed. 

A controversial criterion in evaluating the area of teaching is student 
learning. Proponents believe that student learning is the ultimate evidence
of effective teaching. Opponents argue that effective teaching and student
learning are not necessarily associated. Certainly, much is yet to be learned
about the relationship between teaching and learning. The effort to use
student learning as one of many criteria to evaluate teaching will increase
our understanding of this relationship. 

The quantitative approach to objectivity requires using multiple sources
of data to evaluate faculty. If there exists one conventional wisdom in the
field of faculty evaluation, it is that using multiple data sources is desirable.
Students have been the most studied source of data. Student rating forms
are commonly used to evaluate faculty, and many studies indicate that
students constitute a reliable data base. Some bias has been found in
student ratings, but not enough to invalidate them. The real validity issue
is whether the items placed on student rating forms really characterize 
effective teaching. 

Peer evaluation has been insufficiently studied, and there is a lack of
understanding of how colleagues are used and should be used. A controversial
feature of peer review concerns the use of classroom visitation
versus review of teaching materials. Some faculty perceive classroom visits
as threatening, negative, and unreliable. Other faculty believe that reviewing
teaching materials is too removed from the act of teaching.

The use of self-evaluation is even less studied than the use of peer
review. Documentation by faculty of their teaching, research, and service 
activities seems to hold potential as an additional source of data. The use 
of teacher dossiers probably will become more common in faculty evaluation
programs.

Administrators remain the principal actors in initiating, developing, 
and implementing faculty evaluation. In most cases, department heads 
and deans use available data rather than collect their own. A healthy trend
would be increased involvement of faculty themselves. Faculty judgment
in developing meaningful criteria and standards is essential to adequate
evaluation. A concern for the authors of this monograph is a "crisis of
spirit" created by easy-to-measure criteria that do not reflect the important
qualities of teaching, research, and service.

The fourth set of issues concerns the administrative procedures used to
evaluate faculty. The trend is for institutions rather than individual faculty
to become responsible for collecting data on faculty performance. Once
data are collected, administrators also dominate the review process. However,
faculty desire for shared governance and faculty unions' attempts to
reduce the power of administrators presage an increase in faculty
participation.

Another influence on administrative procedures is the court system. 
In general, courts have reinforced the requirement that institutions provide
written criteria and procedures that guarantee due process. Because
of the increased number of litigations initiated by faculty disappointed
by personnel decisions, it is imperative that administrators keep up to
date with legal requirements.

One noteworthy effort to improve faculty evaluation programs was 
carried out by the Southern Regional Education Board. According to the 
evaluation of their faculty evaluation project, the most important
characteristics for improving faculty evaluation are active support and
involvement of top-level administrators plus faculty involvement.

Although the economic factors that precipitated the examination of
faculty evaluation may change, there is little likelihood that there will be
a return to the informal methods of faculty evaluation that characterized
the pre-1970 era. The concept of fair play, reinforced by the courts and
by both administrators and faculty, dictates that institutions make clear
the purposes of evaluation and the areas to be evaluated. In particular,
for the 1980s, one can expect the use of written explicit criteria to be
studied and more commonly used.



Background



In a summation of his ten years of research on the sociology of higher
education, Lionel S. Lewis found that the body of evidence indicated that
merit was a minor factor in academic advancement. In Scaling the Ivory
Tower, he contends that factors other than academic performance influenced
academic advancement, including the ability to get along and not
make waves (1975).

Lewis's view of faculty advancement in the 1970s is supported by Prodgers
in his plea for more systematic faculty evaluation (1980). According to his
anecdotal account of how assistant professor Z was considered for promotion
in the mid-1970s, the department chairman and academic vice
president of the college met to discuss Z's past performance. They knew
Z had dealt effectively with his departmental duties, had worked on occasion
with business and industry, and had published two or three articles.
Weighing Z's past performance against their vision of an ideal faculty
member, they concluded that they "liked the cut of his jib" (1980, p. 1).
Looking back at these days, Prodgers noted in 1980 that, increasingly,
personnel decisions are no longer based on the "cut of one's jib." Rather,
the move is toward a systematized and standardized attempt to "measure"
the quality of faculty performance (1980, p. 1).



In reviewing the literature, one finds justification that Prodgers is correct:
One of the strongest trends in higher education in the 1970s, especially
in the second half of the decade, was to examine how faculty were
evaluated. Actually, perhaps "reexamination" would be a more accurate
characterization because, although interest in faculty evaluation in the
1970s was unprecedented, it was not entirely new. In his study of faculty
evaluation, Miller found that interest existed in the 1920s and '30s and
again in the late '40s and early '50s (1974, p. 1). However, compared with
these earlier periods, interest in faculty evaluation in the 1970s was
considerable, particularly in contrast to the 1960s. Miller observed that the
relatively low interest in the 1960s probably was "due to the wealth of
higher education while expansions in programs and personnel sought to
keep pace with growth in enrollment . . ." (1974, p. 1).

The fact is that, during the expansion years of the 1960s, American
colleges and universities could "get by" with poorly defined evaluation
procedures (Centra 1979). From the college administrator's point of view,
the lack of well-defined evaluation procedures was not a problem. Rather,
administrators were more concerned with finding and keeping faculty
than with evaluating them (Centra 1979). In her study of community colleges
during this period of growth, rapid enrollments, and new campuses,
Mark noted, "It was all administrators could do to keep colleges fully
staffed. The problem was not evaluation, but finding someone to hire"
(1977, p. 1).

Also, from a faculty member's point of view, the lack of well-defined
evaluation procedures was not a problem. As Mark pointed out, "Prior to
the 1970s many, if not most evaluation systems were political, personal,
subjective and chaotic, largely ignored by faculty so long as it did not
interfere with their teaching or job security" (1977, p. 23).

In contrast to the complacency of the 1960s, the use of informal approaches
to faculty evaluation was questioned in the 1970s (Smith 1976).
The reexamination of how faculty were evaluated was brought about primarily
by economic changes. College administrators and faculty members
became concerned with faculty evaluation when program retrenchment
became a reality. Declining enrollments and financial resources plus increasing
costs of operations influenced both administrators and faculty to
reconsider the policies and procedures for promotion, retention, and tenure
(PRT).

In a useful book designed to help administrators and faculty members 
develop and maintain systematic faculty evaluation, Miller (1974) linked
the increased interest in faculty evaluation to three issues: finance, gov- 
ernance, and accountability. 

Finance: "Scarcity of resources means fewer new positions and some
existing ones phased out. Making these difficult decisions requires a
broad data base, and systematic faculty evaluation can serve as one
data base" (p. 3).

Governance: "The faculty is demanding a greater voice in institutional
governance, particularly in matters of promotion and tenure. These
critical questions must be decided on the soundest data base possible,
including evidence of teaching effectiveness from student ratings" (p. 3).

Accountability: "Precise accountability requires some systematic means
of gathering, analyzing, and evaluating data, hence demands for improved
methods of evaluating faculty performance can be expected,
especially from state legislators" (p. 3).

Thus, the increased interest in faculty evaluation in the 1970s can be
explained largely by economic factors, and the impetus for more systematic
evaluation can be closely linked to the issues of finance, governance,
and accountability. One manifestation of the great interest in faculty evaluation
during the 1970s was the willingness of systems of higher education
to assess their own faculty evaluation programs. Daniel L. Stufflebeam,
the director of the Evaluation Center at Western Michigan University,
recognized that, "Good evaluation requires that evaluation projects themselves
be evaluated" (1978, p. 17). He uses the term "meta evaluation" to
refer to evaluation of evaluations. Three meta evaluations conducted by
higher education systems will be briefly reviewed here to demonstrate
evidence of this trend toward self-assessment and to identify the major
issues that will be addressed in this research report.



Three Meta Evaluations



Oregon State System of Higher Education 

In 1973 the Teaching Research Division of the Oregon State System of 
Higher Education began a three-year study entitled, "Faculty Teaching:
Models for Assessment of Quality." An impetus for the study, supported
by a grant from the Fund for the Improvement of Postsecondary Education,
was the recognition that, "Faculty evaluation is an ongoing process even
if there are no systematic means for making the assessments" (Scott,
Thorne, and Beaird 1977, p. 1).

Baseline data were collected in the 1973-74 academic year from four 
discipline areas in member institutions and from cross sections of institutions
stratified by academic rank. Based on these results, a Faculty
Perception Questionnaire was used in the 1975-76 academic year that
asked faculty to rate 34 factors in terms of their influence in promoting
faculty at their institutions, e.g., publications, student ratings, etc.
"Coefficients of consensus" were derived for each factor based on the proportion
of respondents from a given group who agreed or disagreed over the level
of influence. 

Based on the ratings of influence and coefficients of consensus, the 34
factors were organized into three clusters: definitely influential, definitely
uninfluential, and ambiguous. The ambiguous cluster identified factors for
which there was low consensus regarding their influence. For both college
and university faculty, the ambiguous cluster, with 19 factors, was the
largest. Thirteen factors were commonly ambiguous to both college and
university faculty. An example was "innovative effort in teaching." Six
factors were ambiguous to one group, but not the other. For example,
"evidence of student learning in courses" was ambiguous to college faculty,
but definitely uninfluential to university faculty; "supervision of
theses" was ambiguous to university faculty, but definitely uninfluential
to college faculty (Scott, Thorne, and Beaird 1977, pp. 10-14).
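
The three-way clustering described above can be pictured with a short sketch. It is an illustration only: the report does not state how the coefficient of consensus was computed or what cutoffs were applied, so the `consensus` function (share of respondents choosing the most common rating), the 1-to-5 rating scale, the thresholds, the factor names, and the ratings themselves are all assumptions invented for this example and are not data from the Oregon study.

```python
from collections import Counter

def consensus(ratings):
    """Hypothetical coefficient: share of respondents giving the modal rating."""
    counts = Counter(ratings)
    return max(counts.values()) / len(ratings)

def classify(ratings, consensus_cutoff=0.6, midpoint=3.0):
    """Assign a factor to one of the three clusters named in the text.

    The cutoff and midpoint are assumed values, not figures from the study.
    """
    if consensus(ratings) < consensus_cutoff:
        return "ambiguous"  # respondents disagree about the level of influence
    mean = sum(ratings) / len(ratings)
    if mean >= midpoint:
        return "definitely influential"
    return "definitely uninfluential"

# Invented ratings on an assumed 1 (no influence) to 5 (strong influence) scale.
responses = {
    "factor_a": [5, 5, 5, 4, 5],
    "factor_b": [1, 1, 1, 2, 1],
    "factor_c": [1, 5, 3, 2, 4],
}

clusters = {name: classify(r) for name, r in responses.items()}
for name, cluster in clusters.items():
    print(name, "->", cluster)
```

Under these assumptions a factor lands in the ambiguous cluster whenever respondents fail to agree, regardless of how high or low the average rating is, which matches the report's description of that cluster as the low-consensus one.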

The research group hypothesized four circumstances that may explain
the widespread uncertainty reflected by this high level of ambiguity. In
their view, the primary source of uncertainty was that faculty were unaware
of their institution's evaluation procedures or criteria. A second
possible source was a lack of communication within specific departments.
A third source was ambiguity of procedures, guidelines, and policies on
specific campuses. Finally, the authors acknowledged the possibility that
the questionnaire item could itself have been ambiguous. Nevertheless,
they concluded, "The ambiguous group of factors is, without a doubt,
unnecessarily large. The size of this cluster results in faculty who are trying
to be all things to all people" (Scott, Thorne, and Beaird 1977, p. 18).

State University of New York Faculty Council of Community Colleges 

During the summer of 1977 the faculty council sponsored a research project
to study the theoretical foundations as well as the applied practices
of faculty evaluation. According to the study's author, the project was
prompted by several factors: economic, programmatic, administrative,
educational, and political (Mark 1977).

Mark's review of theoretical models is noteworthy because a major
criticism of faculty evaluation systems is the lack of substantive theory
on which to base evaluation (Meeth 1976). Administrators whose faculty
believe there is a lack of theory may find it helpful to become familiar
with the major models reviewed by Mark. In general, all these models
agree that any evaluation system needs to use a variety of data sources
to effectively differentiate among faculty members. After studying the
current practices in the SUNY system, Mark found that, "All segments of
the community college ought to be involved in some way with the evaluation
process, but with varying and weighted degrees, depending on the
choice of faculty" (Mark 1977, p. 102).

To study the current practices, Mark surveyed 30 institutions and found
that 14 had updated faculty evaluation systems that were written and
perceived effective by the chief executive officer. Based on her in-depth
study of four of these 14 institutions, Mark found that an adversary relationship
characterized communications between administrators and
faculty members. Instead, there needed to be "an atmosphere of cooperation
to discuss what evaluative criteria to use and how to assess them"
(1977, p. 108). She emphasized that, "However a program is evaluated,
the key element must be establishing criteria" (1977, p. 111).

Southern Regional Education Board 

In order to study faculty evaluation practices, the Southern Regional Education
Board surveyed its 843 postsecondary institutions in 1975 and
conducted numerous in-depth institutional case studies in 1976 and 1977.
The economic stimulus for this effort was made clear in the report prepared
by the SREB Task Force on Faculty Evaluation and Institutional
Rewards: "Evaluating faculty performance for purposes of promotion,
tenure and salary increases is of singular importance today because of
leveling and declining student enrollments, lack of faculty mobility and
increasing financial pressures on institutions" (Moomaw et al. 1977, p. 1).
The survey of all regional postsecondary institutions yielded 536 usable
responses, a return rate of 63.6 percent. Institutions were chosen for in-depth
case studies based on indication in their survey that they had a
systematic approach to faculty evaluation. Case studies were developed
from detailed interviews with presidents, deans, department heads, and
faculty.
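
The return rate quoted above follows directly from the two counts given in the paragraph. A short check, with variable names chosen here purely for illustration:

```python
# Counts taken from the paragraph above; the variable names are illustrative.
surveyed = 843   # postsecondary institutions surveyed in 1975
usable = 536     # usable responses returned

# Return rate as a percentage, rounded to one decimal place.
return_rate = round(100 * usable / surveyed, 1)
print(f"Return rate: {return_rate} percent")
```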

Based on the survey and case studies, the task force found that faculty
evaluation tended to be more systematic at doctoral level institutions
compared with master's and bachelor's level and two-year colleges. However,
at all levels, many institutions were vague about precise criteria,
standards, and evidence to be used. There was strong agreement at all
types of institutions that instructional activity was the number one area
of consideration in evaluation. However, there was little evidence
of well-developed procedures.

In addition, the task force found that administrators were both the
main decision makers and the main sources of information for these evaluations.
Few institutions used student data in reliable, consistent, or comparable
ways to make personnel decisions, and even faculty colleague data
often were collected only on an informal basis.



Four Major Issues 



The three self-studies reported here were conducted in the mid-1970s in
three different regions of the country: the Northwest, Northeast, and South.
A striking feature of these studies is their identification of common issues.
These issues will be organized according to the SREB task force framework
(Moomaw et al. 1977) and will provide the basic structure for the remainder
of this report.

Purpose: What are the desired outcomes of faculty evaluation? In general,
studies identify two major outcomes: (a) personnel decisions made regarding
promotion, retention, and tenure; and (b) feedback to faculty leading
to faculty improvement. The SREB task force found that, although
most faculty believe that faculty development and improvement should
be the primary reason for faculty evaluation, few examples could be found
of institutions using the results of evaluation for that purpose (Moomaw
et al. 1977). In a similar vein, the SUNY faculty council study found that
the goal of self-improvement, growth, and development received much lip
service as a "supposedly" important function of the evaluation process;
however, "in practice, there is little evidence that real and meaningful
attention is paid to faculty who are in need of help" (Mark 1977, p. 98).
This report will examine the purposes of faculty evaluation and under
what conditions faculty evaluation can lead to faculty development as
well as personnel decisions.

Areas: What functions of faculty activity are to be evaluated? Teaching,
research, and service are commonly targeted. The Oregon study asked,
"What weights are assigned . . . to teaching, scholarship, and service . . .?
Where should a newly appointed faculty member place the majority of
his energy if he is intent upon attaining a timely promotion?" (Scott,
Thorne, and Beaird 1977, p. 1). The SREB task force observed that, although
administrators say that teaching is the most important area of
evaluation, procedures for evaluating instruction generally are poorly developed
(Moomaw et al. 1977). This report will examine the weight given
to areas of evaluation.

Criteria: For each area to be evaluated, what criteria should be used and
how specific should the criteria be? For each criterion, what are the standards
of attainment? In the Oregon study, Scott, Thorne, and Beaird found
a need among all ranks of faculty to understand performance criteria and
institutional expectations relative to each area of faculty functioning (1977).
Finally, for each standard, what sources of data should be used to show
evidence of attainment? The SREB task force found that data on which
judgments are made were not gathered systematically or consistently
(Moomaw et al. 1977). A major aim of this report will be to examine the
trend toward written explicit criteria.

Procedures: What is the sequence of activities for implementing the faculty
evaluation program? The SREB task force found that administrators usually
initiate and carry out faculty evaluation practices with little faculty
involvement (Moomaw et al. 1977). A lack of faculty involvement was
noted in the SUNY faculty council study; Mark advised that "Faculty must
be involved in the development of any process that is to affect their professional
careers" (1977, p. 101). This report will examine the dominant role
now played by administrators.

Purposes of Evaluation

The abundance of literature on faculty evaluation generally identifies two 
purposes: (1) developing and improving faculty and (2) providing information
to make promotion, retention, and tenure (PRT) decisions. However,
although this two-fold purpose of faculty evaluation is accepted in
theory, whether both objectives are met is not so certain (Moomaw et al.
1977; Prodgers 1980). One critic of faculty evaluation concluded that current
methods and practices do not serve well the development of faculty
and the reward of excellence (Fincher 1980).

One problem with this two-fold purpose of faculty evaluation is the
perceived inherent conflict between faculty development and PRT decision
making. For example, Hawley argues that

if the purpose of the program is to improve the quality of instruction,
faculty members will rightly feel sabotaged when the data are also used
in making decisions about tenure and salaries. In the first case, evaluation
can be seen as helpful; in the second case, it takes on an adversary tone
(1977, p. 39).

In theory, it makes sense that, if faculty are provided with feedback
regarding their deficiencies, they will take action to remedy the deficiencies.
However, to support Hawley's point of view, there is little evidence
that faculty evaluation improves faculty performance (Rippey 1981).

Some educators contend that the apparent conflict between faculty
development and evaluation is not inherent. Rather, the problem has been
the failure to recognize that PRT decision making is not an end: It is a
means to improve instruction and, hence, provide a better education for
students (Rose 1976). In a comprehensive study of the relationship between
faculty development and evaluation, Smith recognized that some college
administrators and faculty members believe these two functions should
be administered as separate programs. However, he argued that faculty
development and evaluation should be combined into one program because
they share a common goal, improvement of college teaching (Smith
1976).

The authors of this research report believe that the conflict between
faculty development and evaluation is not inherent. Faculty evaluation
can serve the dual purposes of faculty improvement and PRT decision
making if it is accepted that both purposes share the long-range goal of
improved instruction and student learning. Collecting two separate data
bases for faculty development and faculty evaluation strikes us as inefficient
and costly, unnecessarily so. However, the fact that there is little
demonstration of faculty improvement resulting from evaluation must be
addressed. The question is, under what conditions can evaluation lead to
improvement? For example, student ratings alone are not likely to improve
instruction. However, according to a literature review conducted by
Levinson and Menges, seven studies support the contention that a combination
of student ratings and personal consultation (help with interpretation
of ratings, suggestions for improving teaching skills, etc.) favors
instructional improvement (1979). In addition, Levinson and Menges found
that a combination of self-ratings and student ratings leads to improvement,
especially when student ratings are less positive than self-ratings
(1979). This condition also was identified by Rippey, whose literature review
did not overlap Levinson and Menges. An additional condition identified
by Rippey was that evaluation conducted early in a college course
favored instructional improvement because it allows faculty adequate
time to make modifications (1981).

At the University of Utah, Department of Family and Community
Medicine, conditions of evaluation favorable to improvement deliberately
were built into the evaluation of clinical teachers in their family practice
teaching rounds. Student ratings were combined with educational consultation
and were contrasted to faculty self-ratings; furthermore, data collection
was begun early enough in the teaching rounds to give the instructor time
to make changes in teaching style and strategies. In a study of this process,
Whitman and Schwenk found that faculty evaluation led to faculty
improvement (1982).

The debate over the purposes of faculty evaluation will continue into
the 1980s. Although the need to make PRT decisions based on
comprehensive and systematic evaluation is undisputed, the rhetoric about
"improvement of instruction" will continue, perhaps without resolution
(Parramore 1979). Furthermore, there will be an exploration of other uses
of faculty evaluation: providing information to students for course
selection, allocating teaching resources, and research on teaching
(Brandenburg, Braskamp, and Ory 1979; Rippey 1981).

Areas for Evaluation 

In higher education, three areas of faculty performance usually identified 
for evaluation are teaching, research, and service. Service is considered a 
catch-all category and rarely attracts attention as a bone of contention. 
According to a survey of university department heads, public and
community service is infrequently recognized and rewarded. Moreover,
department heads did not believe that service should be a major factor in
evaluating faculty (Centra 1977, p. 133).

On the other hand, teaching and research often are seen as competing
obligations. Implicit in the teaching-research dichotomy is the widespread
belief that many faculty relegate teaching to a second-class status because
research is rewarded in the PRT process (Jauch 1976). In general,
institutions vary in the weight each area exerts in making PRT decisions
(Rippey 1981). In its "Statement on Teacher Evaluation," the AAUP declares
that institutions should at least set forth specific expectations regarding
teaching, research, and service (AAUP 1975). In fact, many institutions do
so. For example, presidents of large universities, especially private ones
such as Stanford University and the University of Chicago, have made
institutional statements favoring research over teaching, whereas small
colleges emphasize teaching (Miller 1974).

The actual weight given to teaching versus research in a particular
institution often is not clear to those who work there. In a study of faculty
and their department heads at the University of Missouri-Columbia, Jauch
found that two-thirds of the faculty members perceived publication to be
more important than teaching most or all of the time. On the other hand,
department heads were evenly divided on the question (1976, p. 9).

Is the publish or perish threat a real one? Lewis claims that except in
a dozen or so prestigious institutions, the threat is an empty one. Yet,
many faculty believe that it is difficult to achieve tenure without
publications (1975). That view was supported by Jauch, who found that,
"Apparently, an individual can be promoted with a good publication record
even though his adequacy as a teacher may be in doubt. A good teacher
with a poor publication record is at somewhat of a disadvantage" (1976,
p. 9). Other studies support the contention that many colleges and
universities declare teaching to be a high priority, but award tenure and
promotion largely on the basis of publication record (Seldin 1975;
Knapper 1978).

A reason for not giving more weight to teaching is that it is difficult
to evaluate. This view is typified by the statement, "If only you could give
the promotions committee more data about the candidate's teaching we
would be glad to use it" (Rippey 1981, p. 24).

One prominent critic of how teaching is evaluated is L. Richard Meeth,
who entitled his overview to the Change Report on Teaching: 2, "The
Stateless Art of Teaching Evaluation." He commented that

Systematic, comprehensive, and valid evaluation of teaching has been an
educational problem for many years. It continues to evade educators,
although most administrators and legislators desire it as a meaningful
way to determine rewards and sanctions for faculty, and most serious
teachers seek it as a way of improving their performance and more closely
relating what they do to what students learn. Most evaluation of teaching
has resulted in unfair and inconclusive distinctions among teachers
without establishing reliable or valid relationships between what teachers
do and what students learn (1976, p. 3).

Knapper found it ironic that, at a time when teaching has assumed
greater importance from the point of view of the students and the
community-at-large, it has not assumed a great importance from the point of
view of faculty evaluation (1978). Although student pressures in the 1960s
helped to stimulate an examination of teaching practices and brought
about the use of student questionnaires to rate faculty teaching
performance, serious examination of how to evaluate teaching did not follow
until the 1970s. Specifically, attention was given to establishing criteria
to measure faculty performance and standards to judge it.

Criteria and Standards

One manifestation of the interest in faculty evaluation in the 1970s was
the development of explicit criteria. The need for specific and written
criteria on which to evaluate faculty was heightened by the scarcity of
financial resources. A typical comment was that losing a valuable faculty
member or keeping an unproductive one were errors of such magnitude
that it was essential that criteria used in these decisions be as fair and
explicit as possible. In fact, for criteria to be fair, they had to be explicit
(Grinnel and Kyte 1976, p. 44).

The specific attention to the development of explicit criteria also can
be explained by additional factors that are related to the "no growth"
environment. First, academic unions, which grew in number and strength
in the 1970s, became concerned with spelling out the conditions under
which faculty receive tenure (Ladd and Lipset 1973). Second, aspiring
faculty who have been denied promotion have returned to the courts for
redress. In general, the courts expect institutions to publish criteria for
making PRT decisions (Centra 1979, p. 141).

In 1971, Wolff published a study of criteria used for faculty promotion
in college and university speech departments. The study is noteworthy
because it still reflected the growth period of the 1960s. Wolff mailed a
faculty questionnaire on promotion to 200 speech department chairpersons
of randomly selected colleges and universities. Based on a 58 percent
response rate, she found that criteria for faculty promotion in order of
importance were (1) teaching effectiveness, (2) academic degrees,
(3) publication, (4) extracurricular speech activities, (5) research,
(6) committee involvement with school development, and (7) scholarly
activities (Wolff 1971). Emblematic of the primitive state of faculty
evaluation at the time was the general nature of these criteria. In fact,
these are not much more detailed than the three areas of faculty evaluation:
research, teaching, and service.

Also, the ease of promotion during a period of growth was reflected by
comments made by departments on the survey (Wolff 1971, p. 283):

• "Promotions can be granted to an individual who just stays around 
and does an adequate job." 

• "Tenure is automatic unless teaching effectiveness or notorious
conduct leads to uncontestable dismissal."

• "Our speech department has all top-ranking faculty."

If Wolff's study was emblematic of evaluation during the period of
growth, the study published in 1972 by Schulman and Trudell symbolized
the changing environment. In anticipation of a law passed by the
California Assembly in 1971 requiring that public school and community
college teachers be evaluated at least once every two years (Senate Bill
696), the Innovations Committee of Los Angeles Pierce College studied
guidelines for evaluation that would be acceptable to those affected by the
new law (Schulman and Trudell 1972).



The committee surveyed the literature on evaluation (they found that
more than 2,000 studies of teacher evaluation had been made since 1900),
submitted a pilot questionnaire to instructors and administrators in the
Los Angeles Community College District, and mailed a revised
questionnaire to instructors and administrators within the 94 public
community colleges in California. Based on responses from more than 60
percent of the questionnaires, representing about 70 percent of the
community colleges, the committee found that criteria for evaluation of
teaching were, perhaps, the most troublesome aspects of faculty evaluation
(1972, p. 34).

For example, there was little agreement about how to measure teaching
effectiveness with objectivity. The committee's recommendation was to
admit the subjectivity of measuring teaching effectiveness and to select
criteria that can be utilized in as nonsubjective a manner as possible.
According to the committee, examples of specific criteria that described
teaching effectiveness included (1) ability to relate to students, (2) ability
to arouse interest, (3) friendliness, (4) empathy, and (5) knowledge of
subject matter. To implement these criteria as objectively as possible, the
committee suggested that classroom visits, if used, should be made by
judges who are most competent to determine effectiveness, for example,
department or division colleagues. Also, when students are given
evaluation forms to complete on their instructors, they should be instructed
as to the nature of their task and cautioned against emotional judgments,
pro or con (Schulman and Trudell 1972).

The problem with subjectivity also was addressed in an evaluation 
plan implemented at New River Community College in Dublin, Virginia 
(McCarter 1974). In describing the New River program, McCarter stated:

Frequently, an expressed goal of instructional evaluation is to achieve
objectivity during the process. That this is to any degree possible is at least
a doubtful proposition. Even so, it need not deter a school or college from
attempting a creditable faculty evaluative system (1974, p. 32).

To promote the goal of objectivity, McCarter recommended the use of
collective judgments by students, peers, and supervisors. By using the
judgments of many, the subjectivity of the individual would be minimized
(1974).

The suggestion in the California study that evaluators should strive to
be as nonsubjective as possible and the suggestion in the New River plan
that the collection of judgments be as comprehensive as possible represent
two approaches to dealing with subjectivity in evaluating instruction.
House characterizes these approaches to objectivity as qualitative versus
quantitative. The qualitative sense of objectivity refers to the quality of
an observation regardless of the number of people making it. Being
objective means that the observation is factual, but being subjective means
that it is biased. The quantitative sense of objectivity refers to the number
of people making the observation. One person's opinion is regarded as
subjective, whereas objectivity is achieved through the experience of many
observers (House 1980). For the most part, faculty evaluation programs
attempt to increase objectivity through both qualitative and quantitative
approaches: To achieve qualitative objectivity, criteria are developed to
improve the quality of data collected from an individual evaluator. To
achieve quantitative objectivity, data are collected from multiple data
sources.

Qualitative objectivity. Many people can provide data about faculty
performance: students, colleagues, administrators, and faculty themselves.
The major approach to improving the quality of data from these persons
is to provide them with criteria that are specific and written. The rationale
is that by providing specific behaviors, features, measures, or indicators
to be examined in the areas of teaching, research, and service, the persons
doing the evaluating will know what to assess. By providing evaluators
with criteria, it is hoped that evaluation will be fair, relevant, and
appropriately focused.

This approach has been reinforced by court decisions. For example, in
the case of Harkless v. Sweeny Independent School District of Sweeny,
Texas (11 FEP 1005, 1975), it was pointed out that objectivity in faculty
evaluation could be achieved by adhering to the following three guidelines
(Balch 1980):

1. The language in the evaluation instrument used to describe each
characteristic to be measured must be composed of words which are
reasonably precise and uniform in meaning.

2. There must be a fairly specific standard of measurement to guide
the evaluator in ascribing a particular value to a particular
characteristic.

3. There must be a reasonably well-defined system for assigning relative
weight to the characteristics measured (p. 4).

The problem with this approach is that researchers disagree as to 
whether there is a well-defined set of criteria for judging faculty
performance (Tuckman and Hagemann 1976). Whereas Johnson and Stafford
claim that the faculty reward structure is determined by rational criteria 
(1974), others argue that no acceptable criteria have been developed 
(Batista 1976) or that administrators and faculty are using different sets 
of criteria (Meany and Ruetz 1972). 

The difficulty of agreeing on criteria was cited in a study of faculty
evaluation in Ph.D. graduate departments of sociology. According to
Gaston, Lantz, and Snyder, "The role of all criteria for promotion
(publication, good teaching, and service) remains unclear in the actual
promotional decision" (1975, p. 242). In a study of written criteria used by
graduate schools of social work during the 1974-75 academic year, Grinnel
and Kyte found that, although most schools had carefully defined
procedures to evaluate faculty, they lacked specific, objective criteria on
which to base these evaluations. However, as evidence of a trend toward the
use of written criteria, 64 out of 72 responding schools reported they were
either in the process of working on new written criteria or anticipated
doing so (1976, p. 44).

A major difficulty in developing criteria is in the domain of judging
the quality of work. For example, in the area of service, it is relatively
easy to document participation. However, merely being involved in public
or community service is not a sufficient indicator of effectiveness (Centra
1979). Similarly, in the area of research, it is relatively easy to establish
a number of required publications; however, it is difficult to produce
explicit criteria to judge the quality of published works (Gaston, Lantz, and
Snyder 1975).

There are examples of faculty on promotion committees critically eval- 
uating the quality of published work, but the criteria used were not pre- 
established. For example, a dean's ad hoc tenure review committee at 
Pennsylvania State University, upon denying tenure for a faculty member, 
read the person's published work and found two studies "deficient in
design, methods, implementation, and, in the case of one, in conclusions" 
(Balch 1980, p. 8). 

The need for specific criteria in the area of teaching was recognized
by Spencer, Crow, and Glass, who reported the work conducted by an ad
hoc committee of the Cornell University Medical College Department of
Psychiatry during the 1977-78 academic year. The authors found two
major difficulties with evaluating teaching: (1) the absence of a single,
concrete end product such as the published results of research and
(2) problems with reliability and validity that seem to accompany any
attempt to measure teaching effectiveness (1979). Others also cite the
difficulties colleges and universities have with developing criteria in the
area of teaching (Meeth 1976; Miller 1974).

One technique to develop criteria for teaching is to design evaluation
forms that list the items teachers, students, and administrators deem
important. Berk (1979) and Wotruba and Wright (1975) present
easy-to-follow methodologies to design such a form, and Fenker describes how
a teacher evaluation form was designed at Texas Christian University (1975).
In addition, Arreola describes how an instrument developed at Michigan
State University was adapted by Florida State University (1973).

According to a literature review conducted by Dwyer in 1973, teacher
evaluation forms were being used extensively by colleges and universities
in the United States as a means to evaluate teaching effectiveness.
However, he found that what was lacking was evidence that the characteristics
listed on these forms made any real difference in the achievement of
educational objectives by students (1973). This criticism of rating forms was
supported by Meeth:

If we better understood how students learn, we might better understand
how and what teachers ought to teach. The many lists of teaching activities
prepared over the years . . . cannot be ranked very conclusively from most
to least important in terms of producing learning (1976, p. 3).

In order to clarify the types of criteria that could be developed to more
conclusively evaluate teaching, Meeth (1976) adapted Thorndike's
categories of criteria to teaching effectiveness. "Immediate" criteria are the
lists of teaching behaviors that people believe are related to teaching
effectiveness, e.g., lecture style was conversational, audio-visual aids were
reinforcing, etc. Although these are better than nothing, Meeth complains
that immediate criteria are furthest from learning outcomes and are a
long way from relating teaching to learning.

Closer to learning outcomes are "intermediate" criteria, which
describe the process of teaching:

• Students were motivated to learn.

• The structure of the learning experience was determined by the goals of
the experience.

• The content was well ordered, comprehensive, and appropriate to the
abilities of the learners.

• Rewards and sanctions were appropriate to the goals of the learning
experience.

• Goals and/or outcomes were clearly specified.

• Evaluation criteria, standards, and methodologies were clear and
appropriate to the goals of the experience.

• Methodology was appropriate to the goals of the experience and the
abilities of the learners. (Meeth 1976, p. 4)

Closest to learning outcomes are "ultimate" criteria, which describe
what students learned:

• The students learned what the instructor was trying to teach

in cognitive, affective, and/or psychomotor development
in rate and/or absolute achievement.

• Students retained what was learned.

• Teacher goals and/or outcomes for the learning experience were met.

• Student goals and/or outcomes for the learning experience were met.
(Meeth 1976, p. 4)

Some educators do not favor using student achievement (ultimate
criteria) as a means to measure teacher effectiveness because of differences
in the difficulty of instructional objectives, difficulty with measuring some
instructional objectives, and the potential abuse by instructors who "teach
to the test" (McCarter 1974, p. 32). Proponents of using student
achievement argue that, when only the process of teaching is measured
(intermediate criteria), only half the evaluation process is really
accomplished (Mark 1977, p. 104).

Including student learning as one of the criteria to be used in evaluating
teaching is one of the most controversial issues in the field of faculty
evaluation. Murray acknowledged that to say that the best teacher is one
whose students learn the most has intuitive appeal. However, he warns
that, although it is easy to agree with a statement like that, it is almost
impossible to put it into action (Murray 1979). The difficulty of using
student learning as a criterion to measure teaching effectiveness also was
acknowledged by the Special Interest Group on Instructional Evaluation
at the 1977 Annual Meeting of the American Educational Research
Association, which rejected its use to evaluate the effectiveness of
instruction or instructors (Darr 1977).

Nevertheless, including student learning as one of the criteria to be
used in evaluating teaching makes sense to the authors of this report. The
rationale described by Martin for assessing teaching methodologies aptly
justifies why it is necessary to seek out how to use student learning as a
criterion of teaching effectiveness.

So the teacher chooses: subject matter, points of emphasis within the
discipline, in other words, what will be taught; the teacher chooses the
methodology of this inquiry, its strategy and tactics, in other words, how
to proceed; the teacher chooses the timing, the sequences, the specific
chronology of events, in other words, when things will come together to
form the basis for choice; and, finally, the teacher chooses the principles:
why? and so what? These are the questions that figure in the
conclusions and inferences for action.

The teacher chooses and the teacher acts, and, working with the
student, helps the student develop a capacity for choice and action. Our
commitment to this skill, to this service, needs to be kept in mind as we
assess the methodologies of the teaching profession (1981, p. 60).

Quantitative objectivity. Achieving qualitative objectivity has been
discussed in terms of providing explicit criteria to those who evaluate
faculty. Because of the difficulties with developing valid criteria and doubts
over the reliability of individual evaluators, multiple sources of data often
are used. Using multiple data sources constitutes a quantitative approach to
the problem of objectivity. The Southern Regional Education Board, among
others, has recognized the need for this quantitative approach:

A system for evaluation [should] include provisions for collecting data
from many sources and recommendations from multiple participants,
since decisions made even in the "most carefully conceived" systems of
evaluation will still largely depend upon a collection of subjective
statements (Moomaw et al. 1977, p. 7).

In fact, if there is one conventional wisdom in the field of faculty
evaluation, it is that multiple sources of data are preferable to one source
(Batista 1976; Centra 1979; Darr 1977; Goldschmid 1978; O'Hanlon and
Mortensen 1980). Opinions differ over how to use them.

Students. There has been a considerable amount of research interest and
effort regarding the use of student rating forms since the first formal form
(the Purdue Rating Scale of Instruction) was published in 1926 (Darr 1977).
One reason for studying the use of student ratings is that this method of
evaluation is used more frequently than any other. According to one
survey, approximately 68 percent of universities in North America use student
ratings (Bejar 1975). The rationale for using student ratings is that, since
it is difficult to attribute student learning to the skills of teachers, the next
best thing is to ask students to rate characteristics of teachers that one
would logically expect to be determinants of student learning (Murray
1979). For the most part, faculty members believe that student ratings
should be used as one of several sources of information in making PRT
decisions (Goldenstein and Anderson 1977).

One focus of research regarding student ratings has been their
reliability. To what extent are ratings consistent or dependable for a given
teacher? One way of looking at reliability is to study inter-item
consistency, i.e., if six items on a rating form are supposed to measure the
same aspect of teaching, is there a high average correlation among the six
items? In general, studies of inter-item consistency demonstrate high average
correlation coefficients. In other words, if students rate a teacher high on
one item, they usually will rate him or her high on other items intended
to measure the same characteristic (Murray 1979, p. 9).
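
As a rough illustration of what an "average correlation among items" means
in practice, the following Python sketch computes the mean pairwise Pearson
correlation among the items of a rating form. It is only a sketch under
stated assumptions: the three-item form, the five sets of ratings, and the
function names are hypothetical and are not taken from the studies reviewed
by Murray.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_inter_item_correlation(ratings):
    """Mean correlation over every pair of items.

    ratings: one row per completed form, one column per item.
    """
    items = list(zip(*ratings))           # transpose: one tuple per item
    pairs = list(combinations(items, 2))  # every distinct pair of items
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: five completed forms, three items meant to measure
# the same aspect of teaching, each scored on a 1-5 scale.
ratings = [
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [3, 3, 3],
    [1, 2, 1],
]

mean_r = average_inter_item_correlation(ratings)
print(round(mean_r, 2))
```

A value near 1 indicates that the items rise and fall together, which is the
pattern the inter-item studies describe; a value near 0 would suggest the
items are not measuring the same characteristic.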

Another way of looking at reliability is to study inter-rater consistency,
i.e., do students agree with one another in the ratings they give a teacher?
In general, inter-rater reliability is high, particularly when there are 15
students or more. With fewer than 15 students, inter-rater reliability drops
off considerably, and, with fewer than 10 students, it is probably unwise to
use student ratings (Centra 1973).

A third way of looking at reliability is to study test-retest consistency,
i.e., are ratings similar at two points in the same course or same type of
course? In general, test-retest reliability is high. Teachers who receive a
high rating in the middle of a course are likely to receive a high rating at
the end of the course. Likewise, teachers who receive a high rating in a
course are likely to receive a high rating when teaching the same or a
similar course again (Murray 1979, p. 12).

On the other hand, in general, there is low reliability across different
types of courses. For example, ratings for teaching a large introductory
lecture course and for teaching an upper-class seminar may be uncorrelated.
One implication of the low correlation of ratings across different types of
courses is that student ratings used in PRT decisions require a good
sampling from different types of courses because a high or low rating in one
type of course cannot be used reliably to judge a faculty member's teaching
skills (Murray 1979, p. 13).

Although most faculty favor inclusion of student ratings for faculty
evaluation, some question the extent to which extraneous course, student,
and instructor characteristics influence such ratings (Brandenburg,
Braskamp, and Ory 1979). In a review of studies Murray (1979) found:

• Students in larger classes gave lower ratings.

• Teachers who assign low grades tend to receive lower ratings.

• Classes that meet at mid-day tend to receive lower ratings. 

• Ratings on days when attendance is very high or very low tend to 
be high. 

According to Murray's analysis of the studies of student bias, bias factors,
although statistically significant, are not large enough to single-handedly
invalidate student ratings as a measure of teaching effectiveness (1979,
p. 24).

In addition to student bias, another objection to using student ratings
in PRT decisions is that they are not valid measures of teaching
effectiveness (Brandenburg, Braskamp, and Ory 1979). In other words, some
teachers with high student ratings may actually be associated with low
levels of student learning, and other teachers with low student ratings
may actually be associated with high levels of student learning. A study
often cited to support the invalidity of student ratings is known as the
"Dr. Fox" study. In this study, an actor was trained to lecture
charismatically but nonsubstantively on a topic he knew nothing about,
"Mathematical Game Theory as Applied to Physician Education." The actor,
introduced as Dr. Myron L. Fox to a group of psychiatrists, psychologists,
and social workers, had been coached to use double talk, non sequiturs,
and contradictory statements. In general, those who attended the live
lecture and those who viewed a videotape of the lecture rated Dr. Fox as
a good teacher: He seemed interested in the subject; he used enough
examples to clarify his material; he presented the material in a well
organized form; he stimulated thinking, etc. (Naftulin, Ware, and Donnelly
1973).

One conclusion of this study is that students, even if they are
professional educators, can be "seduced" into thinking that a teacher is good.
This "Dr. Fox effect" has been replicated in a series of similar studies
(Ware and Williams 1975; Kane and Schorow 1977; Ramagli and
Greenwood 1980). The counter-argument to the "Dr. Fox effect" is that,
although student ratings are not sensitive to content differences under some
circumstances, such as a single-episode guest speaker, real teachers in real
classrooms cannot fool students into thinking they have learned when
they have not.

The fact that student ratings can be biased by irrelevant factors and
that student ratings are not consistently correlated to actual learning
highlights the need for multiple data sources. In recommending why
student ratings should be used in conjunction with other measures, Sheehan
pointed out that

Administrators should make use of this information without forgetting
that classificatory errors can result because of the imperfect validity of
the ratings. Until instrumentation is improved, the strategy of
administrators should be one of collecting as much information from as many
sources as possible (1975, p. 697).

Faculty colleagues. At most colleges and universities, colleagues play an
important role in evaluating faculty. In particular, faculty on promotion
committees make judgments regarding the quantity and quality of
research and service. In addition, faculty are asked to rate each other's
teaching based on classroom visits and review of teaching materials
(Centra 1979). However, in contrast to the literature on student ratings of
faculty, there seems to be a dearth of literature on the use of colleague
evaluation (Batista 1976; Darr 1977). Most studies of colleague evaluation
compare their ratings of teaching to those made by students or
administrators. For example, Blackburn and Clark studied the colleague,
student, and administrator ratings of teacher effectiveness for 45 faculty
members in a midwest college and found that the ratings were significantly
correlated (1975, p. 247). Similar results had been documented by Murray
(1972).

For the most part, the method of colleague evaluation is similar to that
used for student evaluation: Faculty are rated on items deemed important
to good teaching. In fact, Nadeau (1977) suggested that students and
faculty use the same rating forms. Hildebrand, Wilson, and Dienst (1971)
described how faculty developed their own rating form.

Although less is known about colleague evaluation than student
evaluation, it is suspected that there are problems with the reliability
of peer evaluators. Often, colleagues base their judgments about the
quality of teaching, research, and service on overall impressions rather
than direct observation. Furthermore, these impressions may be biased by
departmental jealousies and rivalries (Batista 1976, p. 261). The
importance of getting along and not making waves was one of Lewis's major
themes in Scaling the Ivory Tower (1975), and the presence of bias was
underscored by Mark in her SUNY case study: "Personal biases are present
and must be understood and not allowed to influence the evaluation of a
colleague whose style, philosophy and manner of presentation differs from
the evaluator's" (1977, p. 102).

In addition to problems with reliability, colleague evaluation is
compromised by the same problems with validity experienced with student
evaluation: Popularity with students and peers is not necessarily related
to good teaching, and high ratings are not necessarily associated with
learning outcomes. The main problem perceived by Batista is that research
in the area of colleague evaluation has not been dealt with systematically.
He has two recommendations for research: (1) to develop adequate
instruments and (2) to study interaction between the characteristics of
the evaluator and the characteristics of the person being evaluated
(Batista 1976, p. 264).

The lack of understanding of how colleague evaluation works and should
work was also cited by French-Lazovik (1981), who warned that, because
considerable progress was made during the 1970s to improve the quality
of student data, college administrators may believe that other data on
teaching effectiveness are not needed. In one of the most detailed
descriptions of how colleague evaluation should work, she points out that
faculty peer review is an essential data base because faculty peers are
uniquely qualified to judge the substance of teaching.

The approach recommended by French-Lazovik is for faculty to maintain
dossiers on their teaching, research, and service. For example, a dossier
on teaching should include a brief and objective description of each
course taught, its objectives, enrollment, credit hours, etc. In a valuable
contribution to the field of colleague evaluation, French-Lazovik has
designed a form (see Appendix A) for peer evaluators to use when studying
faculty dossiers.

The thrust of French-Lazovik's position is that there is a need to
evaluate aspects of teaching that can only be judged by other faculty.
Referring back to Meeth's adaptation of Thorndike's categories of criteria,
perhaps students are a suitable data base to evaluate immediate criteria,
i.e., to assess teaching behaviors that are believed to be related to
teaching effectiveness. On the other hand, perhaps faculty peers are a
suitable data base to evaluate intermediate criteria, i.e., to assess the
process of teaching.

The need to make better use of peer evaluation was emphasized by
Batista (1976), who pointed out that colleagues are in a better position to
evaluate certain faculty behaviors than are students or administrators.
Teacher behaviors that Batista contends cannot be validly evaluated by 
students or administrators include: 



1. Up-to-date knowledge of subject matter.

2. Quality of research.

3. Quality of publications and papers.

4. Knowledge of what must be taught.

5. Knowledge and application of the most appropriate or most adequate
methodology for teaching specific content areas.

6. Knowledge and application of adequate evaluative techniques for the
objectives of his/her course(s).

7. Professional behavior according to current ethical standards.

8. Institutional and community services.

9. Personal and professional attributes.

10. Attitude toward and commitment to colleagues, students, and the
institution (p. 269).

Thus, although there is a need to know more about how to reliably use
peer review, a more systematic utilization of colleague evaluations
ultimately will provide a more valid evaluation of faculty.

Self-evaluation. In comparison with student ratings, there has been little
study of peer evaluation. However, there has been even less written about
self-evaluation (Darr 1977). One approach to self-evaluation is for faculty
to rate themselves on written scales similar or identical to those used by
students. According to Blackburn and Clark (1975), there is a low
correlation between student ratings and self-ratings; in general, faculty
rate themselves higher. Consequently, self-ratings rarely are used for PRT
decision making. However, self-ratings are recommended for the purpose of
improvement. For example, Whitman recommends using a discrepancy
between student ratings and self-ratings as a kernel of a problem for the
teacher to solve, leading to nonrandom attempts to improve instruction
(1981).
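
The idea of treating a discrepancy between student ratings and
self-ratings as a starting point for improvement can be illustrated with a
minimal sketch. It is not a procedure taken from Whitman (1981): the item
names, the 1-to-5 rating scale, and the one-point threshold are all
hypothetical assumptions chosen only for illustration.

```python
from statistics import mean


def rating_discrepancies(student_ratings, self_ratings, threshold=1.0):
    """Return items whose self-rating differs from the mean student rating.

    student_ratings: dict mapping item name -> list of student scores
    self_ratings:    dict mapping item name -> the teacher's own score
    threshold:       hypothetical cutoff (in scale points) for flagging

    The result maps each flagged item to (self-rating - mean student
    rating), rounded to two decimals. A positive value means the teacher
    rated himself or herself higher than the students did.
    """
    flagged = {}
    for item, self_score in self_ratings.items():
        scores = student_ratings.get(item)
        if not scores:
            continue  # no student data for this item; nothing to compare
        gap = self_score - mean(scores)
        if abs(gap) >= threshold:
            flagged[item] = round(gap, 2)
    return flagged


# Hypothetical data on an assumed 1-to-5 scale.
student = {
    "organization": [4, 5, 4],
    "clarity": [2, 3, 2],
    "rapport": [4, 4, 5],
}
own = {"organization": 4, "clarity": 4, "rapport": 5}

print(rating_discrepancies(student, own))  # {'clarity': 1.67}
```

In this invented example only "clarity" is flagged, which would give the
teacher one specific problem to work on rather than a general impression.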

In addition to completing rating forms, another approach to
self-evaluation is for faculty to describe their academic efforts. For
example, with regard to teaching efforts, self-evaluation would include a
description of the faculty member's approaches to teaching, problems with
teaching, and efforts to improve. We recommend that "effort to improve
teaching" be included as a criterion of teaching effectiveness. Thus,
documentation of how faculty assess their own needs and implement plans to
meet these needs could be used as a source of data in evaluating faculty.
For example, as a "principle of sound evaluation," O'Hanlon and Mortensen
suggest that:

The total evaluation of faculty members should include consideration of
what they are doing for their own development, including attendance at
workshops, redevelopment of teaching materials, trying new approaches,
and seeking help from colleagues and instructional consultants. These
considerations should include how the teacher is profiting from
evaluations received from students and others (1980, p. 666).

In addition to documenting improvement of teaching, we recommend
documenting research and service improvement. We justify this data source
on the basis that, by definition, a good faculty member is one who seeks
to improve performance of teaching, research, and service at any current
level of performance.



Administrative evaluation. Virtually all faculty evaluations conducted for
the purpose of PRT decision making use evaluation by administrators,
since department heads and deans are usually involved in making
personnel decisions. However, rather than generating their own data,
administrators tend to evaluate faculty based on student and colleague
data sources already collected (Darr 1977). When it is possible to judge
from research studies, ratings by administrators tend to be the same as
ratings by colleagues (Ericksen and Kulik 1974, p. 3).

Because of the time involved in becoming familiar with an individual
faculty member's teaching, research, and service efforts, it is unlikely
that administrators will personally evaluate faculty except in small
institutions (Darr 1977; O'Hanlon and Mortensen 1980). Thus, most
attention to administrative evaluation is placed on how administrators use
available data sources. For example, at Franklin and Marshall, a small
liberal arts college in Lancaster, Pennsylvania, a faculty member's
department head and dean evaluate teaching by reviewing (1) evaluations
from all students in all the teacher's courses, (2) exit interviews with
department seniors, (3) "grapevine" feedback from students, (4) course
syllabi, and, sometimes, (5) observations of classroom teaching. Based on
these data sources, the department head and dean independently rate the
faculty member on an ordinal scale:



• Rate a 0 on this criterion if, on the basis of the evidence, it can be
said that the faculty member was below average on all measures and counts.

• Rate a 1 on this criterion if, on the basis of the evidence, it can be
said that the faculty member was below average, taking all counts and
measures as a whole even though on some measures or counts he or she may
have been above average.

• Rate a 2 on this criterion if, on the basis of the evidence, it can be
said that the faculty member was average, taking all counts and measures
as a whole.

• Rate a 3 on this criterion if, on the basis of the evidence, it can be
said that the faculty member was above average, taking all counts and
measures as a whole.

• Rate a 4 on this criterion if, on the basis of the evidence, it can be
said that the faculty member was above average on all counts and measures.

• Rate a 5 on this criterion if, on the basis of the evidence, it can be
said that the faculty member was clearly excellent on all counts and
measures (Michalak and Friedrich 1981, pp. 586-87).

After the department head and dean confer and reconcile any
differences, the ratings are reported to the faculty member, who can
appeal a rating to the dean. The dean makes the final decision.

Their approach to evaluating a faculty member's scholarship is similar.
The department head and dean independently rate scholarship using
another ordinal scale:

• Rate a 0 on this criterion if the faculty member has, during the past
year, (1) had no publications and (2) had no systematic program of
research and study.

• Rate a 1 on this criterion if the faculty member has, during the past
year, (1) published a book review or its equivalent or (2) pursued a
systematic program of research and study leading toward further
publication or the presentation of a new course.

• Rate a 2 on this criterion if the faculty member has, during the past
year, displayed activity in scholarship by having (1) published an article
or equivalent series of book reviews or subsidized studies and (2) pursued
a systematic program of research and study leading toward further
publication or the presentation of a new course.

• Rate a 3 on this criterion if the faculty member has, during the past
year, displayed good scholarship by having (1) published one or two
high-quality articles or edited an anthology or book of readings and
(2) pursued a systematic program of research and study leading toward
publication or the presentation of a new course.

• Rate a 4 on this criterion if the faculty member has, during the past
year, displayed excellence in scholarship by having (1) pursued a
systematic research and study program leading toward further publication
or the presentation of a new course and (2) published a book or equivalent
of articles and/or monographs, or the equivalent in the fine arts; or
(instead of 2) (3) devised a set of procedures or syllabus that could be
expected to affect the teaching of the discipline in first-rate colleges
and universities.

• Rate a 5 on this criterion if the faculty member has, during the past
year, displayed outstanding excellence in scholarship by having
(1) pursued a systematic research and study program leading toward further
publication or the presentation of a new course and (2) authored a
high-quality book, or an equivalent set of articles and/or monographs, or
the equivalent in the fine arts; or (instead of 2) (3) devised a set of
procedures or syllabus that can be expected to substantially change the
teaching of the discipline in first-rate colleges and universities
(Michalak and Friedrich 1981, pp. 584-85).

Michalak and Friedrich admit the subjectivity of the measures. However,
they contend that the use of an ordinal rating scale plus multiple
data sources enhances reliability and validity. Clearly, their approach
attempts to promote objectivity by both qualitative and quantitative means.
In other words, qualitative objectivity is enhanced by a technique (the
ordinal rating scale) to improve evaluations conducted by individual
department heads and deans; quantitative objectivity is enhanced by using
multiple sources of data rather than relying on observations of a single
party. Although the procedures used at Franklin and Marshall reflect the
administrative routines of that institution and may be difficult to
replicate elsewhere, we recommend the technique of providing
administrators with descriptive scales and using multiple raters.
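
The combination of a descriptive ordinal scale and two independent raters
can be sketched briefly. The sketch below is only an illustration under
stated assumptions and is not the procedure documented by Michalak and
Friedrich: the function name and the sample ratings are hypothetical. It
records each rater's 0-to-5 rating and lists the criteria on which the two
ratings differ, so that those criteria can be taken up when the raters
confer.

```python
SCALE = range(0, 6)  # the 0-to-5 ordinal scale described in the text


def ratings_to_reconcile(head, dean):
    """Return {criterion: (head_rating, dean_rating)} where ratings differ.

    head, dean: dicts mapping criterion name -> independent ordinal rating.
    Ratings are compared, never averaged, because ordinal categories are
    ranked labels rather than quantities.
    """
    differing = {}
    for criterion, head_rating in head.items():
        dean_rating = dean[criterion]
        if head_rating not in SCALE or dean_rating not in SCALE:
            raise ValueError(f"rating out of range for {criterion!r}")
        if head_rating != dean_rating:
            differing[criterion] = (head_rating, dean_rating)
    return differing


# Hypothetical independent ratings for one faculty member.
head_ratings = {"teaching": 3, "scholarship": 2}
dean_ratings = {"teaching": 3, "scholarship": 4}

print(ratings_to_reconcile(head_ratings, dean_ratings))
# {'scholarship': (2, 4)}
```

The sketch stops at flagging disagreements because, in the approach
described above, differences are resolved by conference between the two
raters rather than by a formula.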

Having considered the purposes of evaluation (faculty improvement
and PRT decision making), the areas to be evaluated (teaching, research, 
and service) and criteria (explicit and written) to assess these areas of 
performance, the fourth major issue to be addressed concerns procedure, 
i.e., the sequence of activities for implementing a faculty evaluation pro- 
gram. 

Administrative Procedures 

A critical issue in faculty evaluation is determining how data are
collected and reviewed. One approach is for individual faculty to bear the
"burden of proof." In other words, at some institutions, faculty are
expected to provide evidence of their teaching, research, and service
effectiveness. Prior to the 1970s, this was the predominant approach to
data collection. Its major disadvantage is that data collection is not
standardized and often what is evaluated is not faculty performance in the
areas of teaching, research, and service, but rather faculty skill in
collecting and presenting data in a favorable light. Since the 1970s, there
has been a trend toward systematic, standard data collection.
Increasingly, institutions are spelling out what data faculty should
collect regarding their own performance, often in the form of teachers'
dossiers. Moreover, institutions are spelling out what data the institution
will routinely collect regarding faculty performance. The major advantage
is that faculty members know in advance what their responsibilities are
(Centra 1979).

The shift in responsibility from those who are going to be evaluated to
those who are going to be doing the evaluating is reflected in Eble's
recommendations to department heads with respect to PRT decisions:

1. Keep careful records of what each member of a department does as a
teacher, quarter by quarter, year by year.

2. At tenure time or time of other important reviews, reduce these data to
an easily grasped form and place them in the hands of everyone involved
in the review.

3. Most of all, say a great deal about teaching before tenure time. If you
wait to speak, it will always be too late (1978, p. 30).

On almost every aspect of promotion, retention, and tenure, institu- 
tional policies and practices vary. Although some colleges and universities 
may still use informal procedures to make PRT decisions, the trend toward 
more systematic evaluation has included a formalization of faculty
personnel policies and procedures.

In the academic profession, as in other professions, members of the
profession make personnel decisions. In most colleges and universities, it
is senior faculty who help make PRT decisions, although in some cases
even junior faculty are represented in the process. While practices vary,
often faculty committees make recommendations to administrative
officers, e.g., department heads, deans, academic vice presidents, and
presidents (Commission on Academic Tenure 1973).

Once the data are collected, the review process, for the most part, is
dominated by administrators. Given the importance of evaluation
decisions to individual faculty, Moomaw was surprised that faculty rarely
play a substantial role in the functioning of faculty evaluation programs.
The Southern Regional Education Board survey of assignment of principal
evaluation responsibility for decisions on salary, promotion, and tenure
demonstrated conclusively that the academic dean and department
chairperson are the two most important persons. The department chairperson
was more important in doctoral and master's degree institutions and the
academic dean was more important at bachelor's degree and two-year
institutions (Moomaw et al. 1977).

There are two views regarding the dominant role played by
administrators in evaluating faculty. According to one view, evaluating
faculty for making PRT decisions is an appropriate responsibility for
administrators. Defending the view that PRT decision making is an
administrative responsibility, the dean of one New England college of
education commented, "There is no set of data or procedure for gathering
it and no review process that can substitute for our [deans'] professional
judgment" (Philippi 1979, p. 9).

Philippi acknowledges the difficulty of evaluating faculty, but does not
back away from the responsibility:



In a university the academic administrator is a double agent, serving as
an agent of the very faculty he is supposed to evaluate as the agent of the
university. If an academic administrator is incapable of consistent,
sound judgment, termination of that administrator would alone improve
personnel practices in the institution (1979, p. 10).

The view that administrators are responsible for evaluating faculty 
does not exclude faculty from participating in developing and evaluating 
the process. In fact, faculty involvement in developing the evaluation
program and critiquing it is explicitly recommended by Mark (1977).

According to a second view of the role played by administrators,
administrators should have less control. Notably, academic unions generally
try to reduce or eliminate the power of administrators to reward faculty.
For example, unions have sought to have new appointments defined as
"probationary," which implies a claim to permanency for faculty who can
demonstrate that they can handle the job (Ladd and Lipset 1973, p. 72).
In general, where collective bargaining exists, due process for faculty
being evaluated is spelled out, and faculty committees are playing an
increased role in recommending personnel decisions and listening to
appeals. Nevertheless, administrators continue to play dominant roles in
the process.

The need for due process is highlighted by the decisions of courts. In
a review of court cases and their implications (which should be read by
all administrators and faculty concerned with faculty evaluation), Balch
documents that many courts have hesitated to take a role in the
decision-making process of faculty evaluation. For the most part, courts
defer to the expertise of administrators and faculty to evaluate faculty.
However, she notes a rise in the number of cases brought to court and the
willingness of courts to become more involved than they used to be. Balch
attributes the rise in the number of cases brought to court to the
financial retrenchment in higher education. As relocation becomes more
difficult for faculty, the willingness to fight negative personnel
decisions through legal channels becomes more attractive (1980).

The willingness of courts to become more involved can be attributed
to the notion of "state action" in private institutions. In other words,
some courts view private colleges and universities that receive large
amounts of federal and state funding as public institutions. Consequently,
the Fourteenth Amendment, which provides that the state shall not deprive
any person of life, liberty, or property without due process of law, may
apply to private as well as public institutions. Thus, at a minimum,
colleges and universities should comply with due process in making PRT
decisions, i.e., provide faculty with proper notice and an opportunity for
a fair hearing. Also, although courts show little or no interest in the
specific criteria included or evaluation methods used, they do expect
criteria and methods to be published (Centra 1979).

As a result of her review of legal actions, Balch recommended that
administrators should:



1. become knowledgeable concerning the ever-changing legal obligations
and rights of public/private institutions toward faculty evaluation.

2. be certain of his/her role as an administrator in carrying out the
evaluation policies and practices of the institution.

3. make certain that the evaluation process of each faculty member is
job-related.

4. be sure that the faculty evaluations are not discriminatory in intent,
application, or results.

5. be certain that evaluation forms contain precise and uniform language
and should be statistically valid.

6. guarantee that evaluators at the institution are trained in how to use
and analyze evaluation instruments.

7. provide for the performance evaluation process to include the
appropriate variety of represented groups: students, faculty peers within
and without the department, and administrators.

8. insist that performance evaluation procedures be conducted in entirety
before making any changes in personnel decisions.

9. inform faculty members in writing of the results of their performance
evaluation.

10. see that their institution develops thorough written policies
pertaining to the use of faculty evaluation, procedures for administration,
and various rules which may govern any decisions rendered.

11. make certain that these policies are communicated to all newly hired
faculty members before they sign their first contract so that both parties
fully understand the entire evaluation process.

12. provide for consistency of standards and procedures of faculty
evaluation.

13. work for improved administrative-faculty communications to keep
evaluation procedures "above board."

14. create a sense of fairness in facing evaluation problems.

15. not take rash action to mere "hearsay" of other faculty members.

16. check on current insurance policies for maximum coverage permitted
by law (for possible cases of administrative liability in evaluation).

17. employ legal counsel who has a good knowledge of the institution, its
organizational structure, policies, and goals.

18. have this legal counsel keep the administrative staff and faculty, as
well as students, current on their rights in the entire evaluation picture
(1980, pp. 38-39).

In addition, Balch recommends that faculty should:

1. be aware of and get to know the next highest person in administrative
authority. When a problem arises, contact this person first.

2. try to have a third neutral party present when dealing with a "hot"
issue with either students or administration.

3. keep and maintain written memoranda of conferences and/or
telephone conversations.

4. be aware that nothing can be assumed to be confidential if told to a
student or a colleague.

5. keep up-to-date written personal records of all academic
accomplishments, service, and honors.

6. utilize the student evaluation process of the institution and for added
strength, design self-evaluation forms for classes to respond to other
types of questions.

7. keep all records from the time of hiring (contracts, faculty handbook,
catalogs, etc.) in a chronological file.

8. not attack verbally and publicly the department chairperson, dean, or
president of the institution.

9. keep current about new laws, rules, regulations, or policies which
might affect the teaching position.

10. try to remain out of the "losing" categories such as "immoral,
behaviorally undesirable, or incompetent" (1980, pp.



There are pressures on colleges and universities to develop
administrative procedures acceptable to faculty and their union
representatives and to implement a faculty evaluation program consistent
with the requirements of due process. These pressures point to as much
faculty involvement as possible in designing, implementing, and critiquing
the evaluation process. In response to these pressures, the Southern
Regional Education Board initiated its faculty evaluation project in 1977
to help member institutions design, revise, or critique their faculty
evaluation programs. This regional approach is worth noting because the
major reasons for its success may be applicable elsewhere.


SREB faculty evaluation project. SREB's faculty evaluation project was
an 18-month project to help 30 participating institutions promote the
principles of comprehensive, systematic faculty evaluation. A stimulus for
the project had been SREB's 1975 survey and the 1975-76 case studies,
which showed evidence that, in general, faculty evaluation was not
comprehensive and systematic. In the fall of 1977, the project staff
conducted two regional conferences to discuss the SREB findings and to
encourage member institutions to apply to be among the 30 colleges and
universities to develop new or revised faculty evaluation programs with
the assistance of SREB resources. Fifty-six institutions applied for the
30 positions. Selections were based on diversity of type of institution
and reflected various levels of sophistication and types of practice. The
aim of the project was to help improve faculty evaluation on a regional
level:

Central to the project's rationale was the belief that institutions could
benefit from collectively addressing the same issues and using similar
change strategies under a regional umbrella which included periodic group
experiences and access to similar regional resources while working on
appropriate local approaches (O'Connell and Smartt 1979, p. 2).

To implement a regional approach, the SREB formed a task force on
faculty evaluation, which reviewed staff findings, produced
recommendations for developing a new or revised evaluation program, and
served as an advisory committee to monitor progress throughout the project.
At each campus, an institutional team of at least two faculty members and
one academic administrator was formed. Institutional team members
attended three workshops at six-month intervals. After each workshop, one
of the workshop leaders visited each campus to consult with the
institutional team there. In addition, project staff kept in contact with
institutional team members.

The SREB faculty evaluation project was evaluated by a three-member
team: Jon F. Wergin of Virginia Commonwealth University, Al Smith of
the University of Florida, and George E. Rolle of the Southern Association
of Colleges and Schools. On a rotating basis, two members of the
evaluation team observed each of the three semi-annual workshops and used
an evaluation form to assess the effectiveness of these workshops. Also,
after each workshop, one member of each institutional team was
interviewed. In addition, following each visit by a consultant to one of
the 30 campuses, the consultant and the institutional team members
completed an evaluation form. Finally, each evaluation team member visited
five institutions and reviewed the portfolio of five institutions.

According to the evaluation team, the 30 participating institutions 
could be organized into three categories: 

1. Fifteen institutions had set a goal of developing a new
comprehensive faculty evaluation system from scratch. The evaluation team
found that five accomplished their goals in full, i.e., a new system had
been developed, field-tested, approved, and readied for full
implementation. Four had developed a new system that was currently being
field-tested; four had developed parts of a new system such as a student
evaluation form; and two had not progressed much beyond preliminary data
collection such as faculty surveys and interviews.

2. Nine institutions had set a goal of modifying or "fine tuning" their
current system, e.g., revising the student rating form or tying faculty
evaluation close to faculty development. The evaluation team found that
eight of these institutions had made significant progress. In the one
school that had not made significant progress, poor communication and a
low level of trust between the faculty and the administration seemed to be
the deterrent to implementing revisions in their faculty evaluation
program.

3. Six institutions had set a goal of reviewing and assessing the status
quo and improving communication about faculty evaluation. These tended
to be large institutions with existing formal systems of faculty
evaluation. The evaluation team found that the project had little
observable impact at only one of those schools (Wergin, Smith, and Rolle
1979, pp. 7-11).

The evaluation team concluded:



In summary, then, with a few exceptions, the institutional teams [had]
made significant progress toward accomplishing their original goals. This
progress has perhaps been most impressive in those colleges in the first
group who started from "ground zero" . . . Further, there [were] major
successes in both of the other two groups as well. Overall, across the 30
project institutions, observable progress toward goal accomplishment [was]
visible and observable in all but four (Wergin, Smith, and Rolle 1979, pp.
8-9).
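
The counts reported for the three categories can be cross-checked against
the team's summary figure of "all but four." The short sketch below does
only that arithmetic. It rests on one assumption that the source does not
state outright: that the four institutions without observable progress are
the two in the first group that had not moved much beyond preliminary data
collection, plus the one school in each of the other two groups singled
out above. The category labels are shorthand introduced here.

```python
# Institutions per category, as reported by the evaluation team.
group_sizes = {"new system": 15, "fine tuning": 9, "review status quo": 6}

# Outcomes reported within the first category:
# accomplished in full, field-testing, parts developed, preliminary only.
new_system_outcomes = [5, 4, 4, 2]

# Assumed mapping of the "all but four" figure onto the categories.
without_progress = {"new system": 2, "fine tuning": 1, "review status quo": 1}

total = sum(group_sizes.values())
lacking = sum(without_progress.values())

print(total)                     # 30
print(sum(new_system_outcomes))  # 15
print(lacking)                   # 4
```

Under that assumption the category counts, the breakdown of the first
group, and the summary statement are mutually consistent.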

In addition to the importance of the SREB faculty evaluation project
as an emblem of the growing interest in faculty evaluation during the
1970s, the project is important because the major reasons responsible for
progress in the participating institutions are applicable to colleges and
universities elsewhere. According to the evaluation team, seven
characteristics in descending order of importance were:

1. active support and involvement of top-level administrators;

2. faculty involvement throughout the project;

3. faculty trust in administration;

4. faculty dissatisfaction with the status quo;

5. historical acceptance of faculty evaluation; 

6. presence of an institutional statement covering the philosophy and 
uses of evaluation; and 

7. degree of centralized institutional decision making (Wergin, Smith, 
and Rolle 1979). 

For college faculty and administrators who wish to improve faculty 
evaluation at their campuses, these characteristics provide a template for 
assessing conduciveness to change. Also, in the absence of these
characteristics, agents of change are provided with organizational goals
to aim for when preparing for change in faculty evaluation.

Summary and Conclusions



A trend that began in the 1970s and continued into the 1980s has been to
examine how faculty are evaluated. Manifestations of this trend have been
meta evaluations conducted by systems of higher education, more
systematic programs of faculty evaluation developed by individual
institutions, and research studies carried out by evaluation specialists.

Purposes of Evaluation

A critical review of this trend indicates that, currently, faculty evaluation
does not serve well the dual purposes of making personnel
(promotion-retention-tenure) decisions and helping faculty improve. An
examination of faculty evaluation systems indicates that making personnel
decisions is often more readily served than helping faculty improve. For many
faculty, evaluation of their performance is threatening and the ends of
evaluation are perceived as punitive. This view is reinforced in institutions
where there is a low level of trust between the administration and the
faculty. Also, negative attitudes of faculty toward evaluation can be
expected where faculty have not played important roles in the initiation or
development of the faculty evaluation program.

In some institutions, administrators naively believe that faculty
development flows naturally from faculty evaluation. It is assumed that, if
faculty are provided with evaluation data, they will seek to improve their
performance. However, there is little evidence that this is automatically
true. Studies indicate that evaluation can lead to development under
certain conditions, e.g., if educational consultation accompanies evaluation.
Ironically, in some institutions where educational resources are available,
faculty developers intentionally disassociate themselves from the faculty
evaluation program because of its negative image.

An advantage of linking faculty development to evaluation is the
efficiency of one system of data collection. Unfortunately, at the present
time, many administrators and faculty see the purpose of faculty
evaluation as making personnel decisions and pay lip service to the
purpose of faculty improvement. This is unlikely to change in the near future.
However, it will change in colleges and universities that use development
and improvement as a criterion in their evaluation of faculty. In other
words, faculty evaluation will serve the dual purposes of making personnel
decisions and developing faculty where it is rewarding in the PRT process
for faculty to demonstrate evidence of development and improvement.

Areas for Evaluation 

The major areas to be evaluated are teaching, research, and service. The
trend in faculty evaluation is to debate the weight of teaching versus
research. In most colleges and universities service is considered in a distant
third place, although this may not be the case in institutions that have a
historic tradition of public service.

Although some research-oriented universities stress the importance of
research over teaching, most colleges and universities purport to stress
teaching. However, many faculty believe that the emphasis on teaching
is lip service and that research is given more weight in evaluation. In some
cases, this perception is correct. In these institutions, administrators often
lament the difficulty of evaluating teaching and would like to give it more
weight "if only" it could be measured adequately.

Faculty will be less threatened by evaluation when (1) the weight given
to teaching, research, and service is made explicit and (2) discrepancies
between purported and actual weights are diminished. Evaluating what
are believed to be the important areas of performance rather than
easy-to-measure areas will go a long way toward increasing confidence in the
evaluation process. This will require advances in the state of the art of
developing criteria and standards.


Criteria and Standards 

With the general trend toward more systematic and comprehensive faculty
evaluation, efforts have been made to improve the objectivity of evaluation.
The qualitative approach to objectivity emphasizes improving the quality
of data collected from any single source, namely by providing written
explicit criteria and standards of performance. The quantitative approach
to objectivity emphasizes collecting data from multiple sources, e.g.,
students, peers, self, and administrators.

Because of the impetus to give more weight to teaching, criteria
development has focused mostly on clarifying just what the attributes
of effective teaching are, for those doing the evaluating as well as for those
being evaluated. Thus far, the trend is to design forms that ask students
and peers to evaluate an instructor on elements considered demonstrative
of effective teaching.

One unresolved issue is the use of student learning as evidence of
effective teaching. To resolve this issue, much more will have to be known
about the relationship between teaching and learning. Although more is
being learned from the experimental work of cognitive psychologists, the
efforts of colleges and universities to include student learning as one of
many data sources also will increase our understanding of student learning
as a criterion of teaching effectiveness.

Another unresolved issue concerns peer review of classroom teaching
versus review of teacher dossiers. Some faculty consider classroom
visitation an infringement of rights, a negative action. Others believe that
teacher dossiers are too removed from the act of teaching. Unfortunately,
relatively little has been done to study the area of colleague evaluation.
The state of the art is primitive regarding how to use peer review in
technically adequate, useful, efficient, and ethical ways.

In the view of the authors, the development of written, explicit criteria
and standards has produced a tremendous conflict in values, which we
have chosen to call a "crisis in spirit." On one hand, there is the value of
fair play. When the criteria and standards used to evaluate faculty are
explicit, faculty know what is expected of them. Being explicit protects
faculty against race and sex discrimination as well as against other arbitrary
judgments.
On the other hand, there is the value of faculty motivated by intrinsic
rather than extrinsic reasons. Explicit criteria encourage faculty to do
things for the sake of evaluation. A potential abuse is that faculty will
meet criteria, but not with quality or the desired spirit of action. For
example, suppose that one criterion of effective teaching is "the instructor
provides students with an up-to-date bibliography." A teacher who is
intrinsically motivated to conduct courses may naturally be familiar with
new contributions to the literature and will update the bibliography as a
matter of course. In this case, we can imagine a teacher who critically
reads the literature and thoughtfully adds to and subtracts from the
bibliography with student needs in mind. Before the development of explicit
criteria, his or her maintenance of an up-to-date bibliography might even
have been cited ex post facto as evidence of good teaching at the time of
tenure review.

With the advent of explicit criteria, one could now imagine a faculty
member adding new citations to the bibliography without having read
the new materials. In addition, older citations could be dropped without
weighing which of the old sources deserve to be maintained on the
bibliography. In fact, one could even imagine an extrinsically motivated teacher
preparing an annotated bibliography based on information provided on
book jacket covers and journal article abstracts! Meeting criteria for the
sake of evaluation could produce what we have chosen to call a "crisis in
spirit."

The Commission on Academic Tenure in Higher Education (jointly 
sponsored by the American Association of University Professors and the 
Association of American Colleges) anticipated the same problem in its 
final report:

Evaluation too often stresses quantity rather than quality. Review
committees are impressed by the number of publications rather than by their
significance. Extrinsic signs such as the general reputation of journals or
publishers are often substituted for a positive assessment of the work itself.
Nontenured members of faculties, believing that largely quantitative tests
of publication prevail, lose confidence in the evaluation process and are
often prompted to undertake quick projects that will expand their
bibliographies, rather than to work on more difficult or more long-term
problems (1973, p. 39).

Precedent for this crisis can be found in the movement toward
instructional objectives. In the 1960s, when instructional objectives became
popular, proponents argued that instructional objectives promoted fairness
because students would know what to study. The argument was similar
to the one used on behalf of explicit criteria for faculty evaluation: If
students and teachers are informed in advance of the basis of their
evaluation, everyone will have equal opportunity to succeed.

However, the experiences of some faculty with instructional objectives
are that it is difficult to write objectives for high levels of learning, e.g.,
analyzing versus knowing; objectives for low-level learning tend to trivialize
learning, e.g., "the student will be able to define . . . recognize . . .
identify"; and students who study only to meet objectives do not go beyond
the objectives. Similarly, it is possible that faculty evaluation will be based
on criteria that are the easiest to measure and that faculty will meet
standards in a procedural rather than substantive fashion. The challenge in
faculty evaluation is to promote fairness by using explicit criteria and to
promote quality with high standards.

Administrative Procedures

A review of faculty evaluation reveals that, in many colleges and
universities, faculty involvement in initiating, developing, and implementing
evaluation systems is low. The willingness of administrators to take
responsibility for routine data collection (e.g., course ratings by students)
reinforces the notion that evaluation is something done to faculty rather
than by faculty. More faculty involvement in designing and evaluating
systems of faculty evaluation can be expected in colleges and universities
that use colleague ratings and self-ratings. Preparation of teaching dossiers
further involves faculty in the evaluation process. Involvement of faculty
is desirable because faculty judgments are needed to produce meaningful
criteria and standards.

Although the economic factors that stimulated an examination of
faculty evaluation may change, the authors predict no return to the "good
old days" when one was promoted because the department head and dean
"liked the cut of his jib." The requirement of courts that institutions of
higher learning provide written explicit criteria and due process, the
expectation of faculty and faculty unions for shared governance, and the
support by administrators and faculty for fair personnel decisions all point
to a continuation of the trend toward examining how faculty are evaluated
and developing more systematic, comprehensive systems in the 1980s.



Appendix A



Suggested Form for Peer Review of Undergraduate Teaching Based on 
Dossier Materials 

1. What is the quality of materials used in teaching?

Dossier Materials

• Course outline
• Syllabus
• Reading list
• Text used
• Study guide
• Description of non-print materials
• Hand-outs
• Problem sets
• Assignments

Suggested Focus in Examining Dossier Materials

• Are these materials current?
• Do they represent the best work in the field?
• Are they adequate and appropriate to course goals?
• Do they represent superficial or thorough coverage of course content?

Peer Reviewer's Rating: Low |_|_|_|_|_| Very High

Comments



2. What kind of intellectual tasks were set by the teacher for the students
(or did the teacher succeed in getting students to set for themselves), and
how did the students perform?

Dossier Materials

• Copies of graded examinations
• Examples of graded research papers
• Examples of teacher's feedback to students on written work
• Grade distribution
• Descriptions of student performances, e.g., class presentation, etc.
• Examples of completed assignments

Suggested Focus in Examining Dossier Materials

• What was the level of intellectual performance achieved by the students?
• What kind of work was given an A? a B? a C?
• Did the students learn what the department curriculum expected for
  this course?
• How adequately do the tests or assignments represent the kinds of
  student performance specified in the course objectives?

Peer Reviewer's Rating: Low |_|_|_|_|_| Very High

Comments
3. How knowledgeable is this faculty member in subjects taught?

Dossier Materials

• Evidence in teaching materials
• Record of attendance at regional or national meetings
• Record of colloquia or lectures given

Suggested Focus in Examining Dossier Materials

• Has the instructor kept in thoughtful contact with developments in his
  or her field?
• Is there evidence of acquaintance with the ideas and findings of other
  scholars?

(This question addresses the scholarship necessary to good teaching.
It is not concerned with scholarly research publication.)

Peer Reviewer's Rating: Low |_|_|_|_|_| Very High

Comments



4. Has this faculty member assumed responsibilities related to the
department's or university's teaching mission?

Dossier Materials

• Record of service on department curriculum committee, honors program,
  advising board of teaching support service, special committees (e.g.,
  to examine grading policies, admission standards, etc.)
• Description of activities in supervising graduate students learning
  to teach
• Evidence of design of new courses

Suggested Focus in Examining Dossier Materials

• Has he or she become a departmental or college citizen in regard to
  teaching responsibilities?
• Does this faculty member recognize problems that hinder good teaching
  and does he or she take a responsible part in trying to solve them?
• Is the involvement of the faculty member appropriate to his or her
  academic level? (E.g., assistant professors may sometimes become
  overinvolved to the detriment of their scholarly and teaching activities.)

Peer Reviewer's Rating: Low |_|_|_|_|_| Very High

Comments
5. To what extent is this faculty member trying to achieve excellence in
teaching?

Dossier Materials

• Factual statement of what activities the faculty member has engaged in
  to improve his or her teaching
• Examples of questionnaires used for formative purposes
• Examples of changes made on the basis of feedback

Suggested Focus in Examining Dossier Materials

• Has he or she sought feedback about teaching quality, explored
  alternative teaching methods, made changes to increase student learning?
• Has he or she sought aid in trying new teaching ideas?
• Has he or she developed special teaching materials or participated
  in cooperative efforts aimed at upgrading teaching quality?

Peer Reviewer's Rating: Low |_|_|_|_|_| Very High

Comments



Peer Reviewer's Signature ____________________

Date ____________________
Appendix B

Guidelines for Use of Results of the Student Instructional Report 
The following guidelines were developed by Educational Testing Service
staff and college and university representatives to assist institutions in
the appropriate use of student ratings of faculty. Although the guidelines
are based primarily on the use of the Student Instructional Report (SIR),
they have a value beyond their association with the use of this particular
instrument.



The Student Instructional Report (SIR) typically and appropriately is used
for instructional improvement; for tenure, promotion, or salary decisions;
and by students for course selection. These guidelines provide information
to teachers, administrators, and students who use SIR in any of those
ways.* Each guideline, unless otherwise indicated, is appropriate for all
three uses.

It is important that faculty members and administrators understand
clearly how the results of student evaluations will be used, who will have
access to the results, and how their use relates to local contractual
arrangements or institutional policies.

These guideline recommendations were based on a series of studies
with the Student Instructional Report and other research with similar
instruments. A committee of SIR users, ETS staff, and researchers met to
review and discuss the guidelines. The final list represents the experience
and knowledge of this group.

1. Use multiple sources of information. For whatever purpose the results
may be used, it is critical to keep in mind that student instructional ratings
represent only one source of information about teaching performance.
Other information about teaching, in addition to student opinion, also
should be included. In particular, SIR should not be used as the sole basis
for evaluating teaching effectiveness.

2. Use multiple sets of ratings. A pattern of ratings over time is the best
estimate of instructor effectiveness as seen by students. Ratings from only
one course or from one term may not fairly represent a teacher's
performance (although, for course improvement, ratings from a single course
can be useful). For personnel decisions, it is essential to examine rating
trends or patterns over time (see additional comments in number 4
regarding possible course bias).

3. Obtain a sufficient number of student raters. The reliability of the SIR 
items depends on having a sufficient number of students responding in 
order to reduce the effects of a few divergent raters. 



© September 1981, College and University Programs, Educational Testing Service.
* Although there may be other uses of SIR results, these guidelines address the three
most frequent ones.



Currently, reports are not printed for a class with fewer than five
students. Reports based on responses from fewer than 10 students are
flagged with an asterisk and users are advised to interpret them with
caution. When fewer than 10 students respond to any individual item, the
same caution applies.†

The proportion of a class that rates an instructor also is important. If
over a third are absent or choose not to respond, the results may not be
representative of the class. (The reliabilities of all SIR item means are
listed and discussed in SIR Report Number 3.)

4. Take into account course characteristics. A few course characteristics
appear to affect ratings and should be taken into account by reference to
appropriate comparative data or in other ways. Small classes (that is,
under 15) often receive more favorable ratings than larger classes, perhaps
deservedly, since they often provide a better learning environment. Courses
required by the college that are not part of a student's major or minor
field tend to receive somewhat lower ratings than other courses. Ratings
also may differ because of the subject field of the course. For each of these
characteristics, the differences may not be large, but together they can be
significant.

5. Rely more on global ratings than on other items for personnel decisions.
Overall ratings of the teacher or the course (items 39 and 38) tend to
correlate higher with student learning scores in a course than do other
items or factors in SIR. Decision makers, therefore, should focus initially
on the overall evaluation items. Other items and factors in SIR, which
are useful for diagnosing teacher or course strengths and weaknesses, are
important for improvement purposes and for interpreting the overall
ratings in personnel decisions. These items tend to reflect different teaching
styles and therefore should not be summed or averaged to provide a total
score. (SIR Report Number 4 presents data on the relationship between
student ratings and learning scores.)

6. Supplement diagnostic information for teaching improvement. SIR
results help to diagnose teachers' strengths and weaknesses. Although
studies have shown that some teachers can improve after receiving SIR
results, others may not know how to change their instruction. Instructional
development services and resources can help teachers who want to do
something about these weaknesses. It is appropriate to use SIR results in
instructional counseling and to direct teachers to resources for
instructional improvement.



† On the SIR report itself, item means are not computed when more than 50 percent
of the students either omit an item or mark it not applicable, and factor scores are
not computed when there is a high (50 percent) omit or not applicable rate in one
or more of the items in the factor.





7. Use comparative data. Since student ratings typically tend to be
favorable, comparative data (both national and appropriate local data)
provide a context within which teachers and others can interpret individual
reports. In making comparisons, it is important to look at the distribution
of students' responses in each class as well as at means and deciles, and
not to overinterpret small differences. Differences of less than 10 percentile
points on any item or factor generally are not critical, and SIR data are
presented only at 10 percentile intervals. In most cases differences of at
least 20 percentile points are needed to be significant relative to the
national comparative data.

Users of the Student Instructional Report are reminded that the
national data are comparative rather than normative, and the tendency
toward high ratings may work to the disadvantage of some instructors.
Institutions may wish to supplement the national data with local
normative data that are developed over time. (The SIR Comparative Guide
includes a full discussion of the composition of the national data.)

8. Employ standardized procedures for administering the forms in each
class. When the results will be used in personnel decisions, it is critical
to employ standardized administrative procedures. Each institution will
want to develop its own method. One possibility is to have a student,
another faculty member, or someone other than the teacher involved
distribute, collect, and place the questionnaires in a sealed envelope. (Mailing
the forms to students usually results in a poor response rate.) The teacher
should not be present during the process, which probably will take less
than 15 minutes of class time. The timing, preferably during the last week
or two of class, also should be standard; it probably is best to give results
to instructors after grades for the course have been reported.
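
As an illustration only, the numeric rules of thumb in guidelines 3 and 7 can be sketched in a few lines of Python. This is not ETS software: the function names, the wording of the returned notes, and the "inconclusive" label for differences of 10 to 19 percentile points are assumptions made for this sketch. Only the thresholds themselves (a class of five, 10 respondents, one third of the class, and 10 and 20 percentile points) come from the guidelines.

```python
from typing import List

# Thresholds quoted in guidelines 3 and 7.
MIN_CLASS_SIZE_FOR_REPORT = 5   # no report for a class with fewer than five students
MIN_RESPONSES_UNFLAGGED = 10    # fewer than 10 responses: flagged, interpret with caution
NOT_CRITICAL_BELOW = 10         # percentile gaps under 10 generally are not critical
SIGNIFICANT_FROM = 20           # gaps of at least 20 points are needed in most cases


def report_notes(enrolled: int, responded: int) -> List[str]:
    """Return cautionary notes for one class report (guideline 3).

    The note wording is hypothetical; only the cutoffs come from the text.
    """
    if not 0 <= responded <= enrolled:
        raise ValueError("responded must be between 0 and enrolled")
    if enrolled < MIN_CLASS_SIZE_FOR_REPORT:
        return ["no report printed"]
    notes = []
    if responded < MIN_RESPONSES_UNFLAGGED:
        notes.append("flagged: interpret with caution")
    # "Over a third" absent or not responding; integer arithmetic avoids rounding.
    if 3 * (enrolled - responded) > enrolled:
        notes.append("may not be representative of the class")
    return notes


def percentile_gap_label(rating_percentile: int, comparison_percentile: int) -> str:
    """Classify a difference from comparative data (guideline 7)."""
    gap = abs(rating_percentile - comparison_percentile)
    if gap < NOT_CRITICAL_BELOW:
        return "not critical"
    if gap < SIGNIFICANT_FROM:
        # Assumed label: the guidelines do not name the 10-19 point band.
        return "inconclusive"
    return "possibly significant"


if __name__ == "__main__":
    print(report_notes(30, 8))
    print(percentile_gap_label(70, 50))
```

A local evaluation office might apply checks of this kind before a report reaches a personnel committee, so that small or poorly sampled classes are not over-interpreted. The labels are aids to reading a report, not decisions; the guidelines still call for multiple sources and multiple terms of data.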

Additional Suggestions 

1. For additional diagnostic information, use the optional items and
written comments. Use of optional items can make the SIR adaptable to a
wider range of courses. Up to 10 additional and locally written items can
be added to SIR in Section IV (items 40-49). These might be course
specific, provided by the individual teacher or department, or they might be
drawn from the "Suggested Supplementary Items" list included in
Instructor's Guide to Using SIR. Information from these items and from the
comments written by students in response to the last part of SIR (for
example, How can the course, or the way it was taught, be improved?)
provides additional help to teachers. Faculty members and
others who receive this information should keep in mind, however, that
it may not be possible or desirable to satisfy all students' complaints or
wishes.

2. Teachers should be encouraged to supplement their instructional
ratings. This is especially important in personnel decisions or in student use
of SIR results for course selection. Teachers should be encouraged and
given the opportunity to describe what they were trying to accomplish in
the course and how their methods fit those objectives, or to discuss
circumstances they feel may have affected the evaluations. What may seem
like poor ratings in a particular aspect of a course may be due, for example,
to the teacher's attempt at a new or different approach to the course.

3. Carry out local studies, if possible. It also may be desirable for an
institution to supplement SIR research findings with local studies.

4. Do not overuse the forms. If ratings are used in every course every term,
students can get bored and may respond haphazardly or not at all. Faculty
members may resent the lost class time and also may pay less attention
to the results. For these reasons an institution may wish to monitor the
frequency of use of student evaluation. Strike a balance between the need
for external evaluation and the need to experiment freely in instruction.
1980 Research Reports: AAHE members, $3 each; nonmembers, $4 each;
plus 15% postage/handling.

1. Federal Influence on Higher Education Curricula
William V. Mayville

2. Program Evaluation
Charles E. Feasley

3. Liberal Education in Transition
Clifton F. Conrad and Jean C. Wyer

4. Adult Development: Implications for Higher Education
Rita Preszler Weathersby and Jill Mattuck Tarule

5. A Question of Quality: The Higher Education Ratings Game
Judith K. Lawrence and Kenneth C. Green

6. Accreditation: History, Process, and Problems
Fred F. Harcleroad

7. Politics of Higher Education
Edward R. Hines and Leif S. Hartmark

8. Student Retention Strategies
Oscar T. Lenning, Ken Sauer, and Philip E. Beal

9. The Financing of Public Higher Education: Low Tuition, Student
Aid, and the Federal Government
Jacob Stampen

10. University Reform: An International Review
Philip G. Altbach